Auto-redaction is a multi-step process that finds all matches of a given regular expression, then permanently removes the matching text and blacks out the displayed region for each match. The end result is a new PDF document with no trace of the text that matched the regular expression.
The following steps will walk you through the auto-redaction process using the PrizmDoc Backend RESTful Services.
Step 1: Upload Your Source Document
- Upload the source document that you want to redact.
- This can be a document of any format supported by the PrizmDoc Backend RESTful Services, except for DICOM documents, which are not currently supported for redaction.
- In response to this request you will receive a file ID that is used to reference the source document in later requests.
Example
POST http://192.168.0.1:18681/PCCIS/V1/WorkFile?FileExtension=pdf
Content-Type: application/octet-stream
[binary data]
200 OK
Content-Type: application/json
{
"fileId": "5qTYa3gzN9gYUb5SzqUhqg"
}
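As a rough illustration, the upload request above could be built with Python's standard library as follows. This is a sketch under assumptions, not an official client: the base URL is copied from the example, and `build_upload_request` and `parse_file_id` are hypothetical helper names. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: base URL copied from the example above; adjust for your server.
BASE_URL = "http://192.168.0.1:18681/PCCIS/V1"


def build_upload_request(data: bytes, extension: str) -> urllib.request.Request:
    """Build (but do not send) the WorkFile upload request (hypothetical helper)."""
    return urllib.request.Request(
        f"{BASE_URL}/WorkFile?FileExtension={extension}",
        data=data,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def parse_file_id(response_body: bytes) -> str:
    """Extract the file ID from the JSON response body (hypothetical helper)."""
    return json.loads(response_body)["fileId"]


request = build_upload_request(b"%PDF-1.4 placeholder bytes", "pdf")
# To actually send it: urllib.request.urlopen(request).read()

# Sample response body, taken from the example above.
sample_response = b'{"fileId": "5qTYa3gzN9gYUb5SzqUhqg"}'
file_id = parse_file_id(sample_response)
```

Keep the returned file ID; the later steps reference the source document by this value.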
Step 2: Compose a Regular Expression
- Compose the regular expression that will match the text you want to redact in the document.
- The regular expression should adhere to the POSIX extended RE (ERE) or basic RE (BRE) syntax. For syntax details, see http://laurikari.net/tre/documentation/regex-syntax/
- For example, the following regular expression will redact all US Social Security Numbers in a document:
Example
"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"
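Before sending a pattern to the server, it can help to try it locally on sample text. The sketch below uses Python's `re` module, which is an assumption made for illustration: it is not the engine PrizmDoc uses, but this particular pattern relies only on bracket expressions, `{n}` counts, and `?`, which behave the same way in POSIX ERE.

```python
import re

# Same pattern as in the example above.
SSN_PATTERN = re.compile(r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}")

sample = "Employee SSN: 123-45-6789, alternate form 987654321, phone 555-1234."
matches = SSN_PATTERN.findall(sample)
print(matches)
```

Because the hyphens are optional, the pattern also matches any run of nine consecutive digits, so text such as account numbers may be redacted as well.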
Note that the regular expression is sent to PrizmDoc in JSON format, so you must escape it according to JSON syntax. Specifically, each backslash must be doubled.
If you create regular expressions programmatically, using string literals, you may need to escape the string further according to the syntax of your programming language.
Example
Regular expression (searches whole word "the", case insensitive):
"(?i:\bthe\b)"
JSON content:
"regex": "(?i:\\bthe\\b)"
C# code (string literal that embeds the JSON content above):
string regex = "(?i:\\\\bthe\\\\b)";
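One way to avoid escaping by hand is to let a JSON serializer do it. The following sketch, which assumes Python and reuses the illustrative `"regex"` key from the example above, keeps the pattern in a raw string and lets `json.dumps` double the backslashes.

```python
import json

# Raw string: holds exactly the regular expression text, with single backslashes.
pattern = r"(?i:\bthe\b)"

# json.dumps applies the JSON escaping, doubling each backslash.
body = json.dumps({"regex": pattern})
print(body)  # {"regex": "(?i:\\bthe\\b)"}

# Parsing the JSON recovers the original pattern unchanged.
round_tripped = json.loads(body)["regex"]
```

With this approach the source code contains the pattern exactly as you would write it on paper, and only the serialized JSON carries the doubled backslashes.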
Step 3: Create Markup XML from the Regular Expression
Before the actual redaction process can start, the regular expression must be converted to a format that PrizmDoc can use for redaction. PrizmDoc uses a proprietary XML syntax to define the markups used for redaction. You can generate this markup XML by sending a POST request with two inputs:
- The file ID of the source document you uploaded in Step 1, and
- The regular expression you created in Step 2.
Example
POST http://192.168.0.1:18681/PCCIS/V1/RedactionCreator
Content-Type: application/json
{
"input": {
"documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
"autoRedactionRegularExpressions": ["[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"]
}
}
200 OK
Content-Type: application/json
{
"processId": "Rr64ma-U_HseoPrs6y0iiw",
"expirationDateTime": "2014-12-03T18:30:49.460Z",
"input": {
"documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
"autoRedactionRegularExpressions": ["[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"]
},
"state": "processing",
"percentComplete": 0
}
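The request body above can be assembled programmatically so that the file ID and the patterns are serialized (and escaped) consistently. This is a minimal sketch assuming Python's standard library; the base URL is copied from the example and `build_redaction_creator_request` is a hypothetical helper name. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: base URL copied from the example above; adjust for your server.
BASE_URL = "http://192.168.0.1:18681/PCCIS/V1"


def build_redaction_creator_request(document_file_id, patterns):
    """Build (but do not send) the RedactionCreator POST request (hypothetical helper)."""
    body = {
        "input": {
            "documentFileId": document_file_id,
            "autoRedactionRegularExpressions": list(patterns),
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/RedactionCreator",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_redaction_creator_request(
    "5qTYa3gzN9gYUb5SzqUhqg",
    [r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"],
)
# Decode the body again to inspect what would be sent.
payload = json.loads(request.data)
```

The `processId` in the server's response identifies the resource to query in the next step.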
Step 4: Check Status of the RedactionCreator Resource
- The process that generates the markup XML runs asynchronously on the PrizmDoc server. The POST request you sent in Step 3 returns immediately, before the output is ready, so you need to check the status of the process by sending a GET request to the resource you just created.
- The JSON response includes a "state" property. When its value is "complete", the response also includes an "output" property, and you can proceed to the next step.
- See the Redaction Creator API for more details of this request.
Example
GET http://192.168.0.1:18681/PCCIS/V1/RedactionCreator/Rr64ma-U_HseoPrs6y0iiw
200 OK
Content-Type: application/json
{
"processId": "Rr64ma-U_HseoPrs6y0iiw",
"expirationDateTime": "2014-12-03T18:30:49.460Z",
"input": {
"documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
"autoRedactionRegularExpressions": ["[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"]
},
"state": "complete",
"percentComplete": 100,
"output": {
"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
}
}
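The status check is typically repeated until the process finishes. The sketch below shows one possible polling loop in Python. It is an illustration under assumptions: `wait_for_output` is a hypothetical helper, the caller supplies a `fetch_status` function that performs the GET and returns the parsed JSON, and any state other than "processing" or "complete" is treated as unexpected because this walkthrough shows only those two values.

```python
import time
from typing import Callable


def wait_for_output(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
) -> dict:
    """Poll until "state" is "complete", then return the "output" object."""
    for _ in range(max_attempts):
        status = fetch_status()
        state = status.get("state")
        if state == "complete":
            return status["output"]
        if state != "processing":
            # Assumption: only the two states shown in this walkthrough are expected.
            raise RuntimeError(f"Unexpected process state: {state!r}")
        time.sleep(interval_seconds)
    raise TimeoutError("Process did not complete within the allowed attempts")


# Demonstration with canned responses instead of real HTTP calls.
responses = iter([
    {"state": "processing", "percentComplete": 0},
    {
        "state": "complete",
        "percentComplete": 100,
        "output": {"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"},
    },
])
output = wait_for_output(lambda: next(responses), interval_seconds=0)
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library, so the same helper can be reused for other asynchronous resources.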
Step 5: Start the Markup Burning Process (Redaction)
- Using the file IDs you obtained for the source document in Step 1 and the XML markup file in Step 4, you can now start the process to redact the document. This is accomplished by sending a POST request which will start a process that runs asynchronously on the PrizmDoc server to produce a redacted document.
Example
POST http://192.168.0.1:18681/PCCIS/V1/MarkupBurner
Content-Type: application/json
{
"input": {
"documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
}
}
200 OK
Content-Type: application/json
{
"processId": "bQpcuixhvGmNqn5ElskO6Q",
"expirationDateTime": "2014-12-03T18:30:49.460Z",
"input": {
"documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
},
"state": "processing",
"percentComplete": 0
}
Step 6: Check Status of the MarkupBurner Resource
- The process that generates the redacted document runs asynchronously on the PrizmDoc server. The POST request you sent in Step 5 returns immediately, before the output is ready, so you need to check the status of the process by sending a GET request to the resource you just created.
- The JSON response includes a "state" property. When its value is "complete", the response also includes an "output" property, and you can proceed to the next step.
- See the Markup Burner API for more details of this request.
Example
GET http://192.168.0.1:18681/PCCIS/V1/MarkupBurner/bQpcuixhvGmNqn5ElskO6Q
200 OK
Content-Type: application/json
{
"processId": "bQpcuixhvGmNqn5ElskO6Q",
"expirationDateTime": "2014-12-03T18:30:49.460Z",
"input": {
"documentFileId": "5qTYa3gzN9gYUb5SzqUhqg",
"markupFileId": "o1bLJwFGxf9QGuTkyrOqig"
},
"state": "complete",
"percentComplete": 100,
"output": {
"documentFileId": "5ufb3ytUb1BxxgSUAk_G9Q"
}
}
Step 7: Download the Redacted Document
- Once the markup burning process completes successfully, the new, redacted PDF document is available for download.
Example
GET http://192.168.0.1:18681/PCCIS/V1/WorkFile/5ufb3ytUb1BxxgSUAk_G9Q
200 OK
Content-Type: application/pdf
[binary data]
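To show how the seven steps fit together, here is a compact end-to-end sketch in Python. It is an illustration under stated assumptions rather than a reference client: `auto_redact`, `run_process`, and the `send(method, url, body)` callable are hypothetical names, the base URL is copied from the examples, and the caller's `send` function is expected to perform the HTTP call (including setting the Content-Type header) and return the raw response body. A fake `send` with canned responses stands in for the server so the flow can be followed without a network.

```python
import json
import time
from typing import Callable, List, Optional

# Assumption: base URL copied from the examples above; adjust for your server.
BASE_URL = "http://192.168.0.1:18681/PCCIS/V1"

# send(method, url, body) performs one HTTP call and returns the response body.
Send = Callable[[str, str, Optional[bytes]], bytes]


def run_process(send: Send, resource: str, input_fields: dict,
                poll_interval: float = 1.0, max_polls: int = 60) -> dict:
    """POST to an asynchronous resource, then poll it until it is complete."""
    body = json.dumps({"input": input_fields}).encode("utf-8")
    status = json.loads(send("POST", f"{BASE_URL}/{resource}", body))
    status_url = f"{BASE_URL}/{resource}/{status['processId']}"
    for _ in range(max_polls):
        if status["state"] == "complete":
            return status["output"]
        time.sleep(poll_interval)
        status = json.loads(send("GET", status_url, None))
    raise TimeoutError(f"{resource} did not complete")


def auto_redact(send: Send, document: bytes, extension: str,
                patterns: List[str], poll_interval: float = 1.0) -> bytes:
    """Steps 1 to 7: upload, create markup, burn markup, download the result."""
    upload_url = f"{BASE_URL}/WorkFile?FileExtension={extension}"
    file_id = json.loads(send("POST", upload_url, document))["fileId"]
    markup = run_process(send, "RedactionCreator", {
        "documentFileId": file_id,
        "autoRedactionRegularExpressions": patterns,
    }, poll_interval)
    burned = run_process(send, "MarkupBurner", {
        "documentFileId": file_id,
        "markupFileId": markup["markupFileId"],
    }, poll_interval)
    return send("GET", f"{BASE_URL}/WorkFile/{burned['documentFileId']}", None)


# Fake transport with canned responses, standing in for a real server.
calls = []


def fake_send(method: str, url: str, body: Optional[bytes]) -> bytes:
    calls.append(method)
    canned = {
        ("POST", f"{BASE_URL}/WorkFile?FileExtension=pdf"): b'{"fileId": "src"}',
        ("POST", f"{BASE_URL}/RedactionCreator"):
            b'{"processId": "p1", "state": "processing"}',
        ("GET", f"{BASE_URL}/RedactionCreator/p1"):
            b'{"processId": "p1", "state": "complete", "output": {"markupFileId": "mk"}}',
        ("POST", f"{BASE_URL}/MarkupBurner"):
            b'{"processId": "p2", "state": "processing"}',
        ("GET", f"{BASE_URL}/MarkupBurner/p2"):
            b'{"processId": "p2", "state": "complete", "output": {"documentFileId": "out"}}',
        ("GET", f"{BASE_URL}/WorkFile/out"): b"%PDF-redacted",
    }
    return canned[(method, url)]


redacted = auto_redact(fake_send, b"%PDF-original", "pdf",
                       [r"[0-9]{3}[-]?[0-9]{2}[-]?[0-9]{4}"], poll_interval=0)
```

In real use, `send` would wrap an HTTP client of your choice, and you would likely add error handling for failed requests and for process states other than the two shown in this walkthrough.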